Exploring Stereotypes and Biased Data with the Crowd

Authors

  • Zeyuan Hu
  • Julia Strout
Abstract

In 2016, Baidu and Google spent somewhere between twenty and thirty billion dollars developing and acquiring artificial intelligence and machine learning technologies (Bughin et al. 2017). A range of other sectors, including health care, education, and manufacturing, are also predicted to adopt these technologies at increasing rates. Machine learning and AI have shown the capacity to greatly improve lives and spur innovation. However, as society becomes increasingly dependent on these technologies, it is crucial that we acknowledge some of the dangers, including the capacity for these algorithms to absorb and amplify harmful cultural biases. Algorithms are often praised for their objectivity, but machine learning algorithms have increasingly made news for a number of problematic outcomes, ranging from Google Photos incorrectly classifying African Americans as gorillas to the judicial system using algorithms that are biased against African Americans (Dougherty 2015; Angwin et al. 2016). These harmful outcomes can be traced back to the data that were used to train the models. Machine learning applications put a heavy premium on data quantity. Research communities generally believe that the more training data there is, the better the learning outcome of the models will be (Halevy, Norvig, and Pereira 2009). This has led to large-scale data collection. However, unless the researchers take extra care, these large data sets will often contain bias that can profoundly change the learning outcome. Even minimal bias within a data set can end up being amplified by machine learning models, leading to skewed results. Researchers have found that the widely used image data sets imSitu and MSCOCO, along with textual data sets mined from Google News, contain significant gender bias (Zhao et al. 2017; Bolukbasi et al. 2016). This research also found that training models with these data amplified the bias in the final outcomes.
Once these algorithms have been improperly trained they can then be implemented into feedback loops where systems “define their own reality and use it to justify their results” as


Similar resources

A Data-driven Method for Crowd Simulation using a Holonification Model

In this paper, we present a data-driven method for crowd simulation with a holonification model. With this extra module, the accuracy of the simulation increases and it generates more realistic agent behaviors. First, we show how to use the concept of a holon in crowd simulation and how effective it is. For this reason, we use simple rules for holonification. Using real-world data, we model the...


Exploring Relevance as Truth Criterion on the Web and Classifying Claims in Belief Levels

The Web has become the most important information source for most of us. Unfortunately, there is no guarantee for the correctness of information on the Web. Moreover, different websites often provide conflicting information on a subject. Several truth discovery methods have been proposed for various scenarios, and they have been successfully applied in diverse application domains. In this paper...


Subjective Health: The Roles of Communication, Language, Aging, Stereotypes, and Culture

A consensually-agreed position among scholars of communication and aging is that while psychological and physical health mutually impact each other, the quality of language to and from older adult individuals shape each of these—and are shaped by them. Encounters with others inside and outside of one’s age ingroup involve stereotyped expectations with regard to language and other speec...


National Identity and Ethnic Identity in the Lifeworld of Baloch Students

The purpose of the present research is to explore, qualitatively, national and ethnic identity in the lifeworld of Baloch students at state universities. In this research, grounded theory has been used as one of the qualitative research methods. The research participants were Master's and Ph.D. Baloch students at state universities. Data were collected through in-depth interviews. Interviews contin...


Digital Art and Crowd Creation in Iran (Case Study: Tehran Annual Digital Art Exhibition)

This paper aims to show the status of digital art in Iran and explain how the meaning of an artist has been transformed in the digital age. The primary assumption of this paper is that the experience of digital art has revived the collective experience of creating art. Although interactivity is considered to be the most important quality of digital art, their collective, collaborative and pr...


Journal:
  • CoRR

Volume abs/1801.03261

Publication date 2018